Smart Paper Notes

Variational Stacked Local Attention Networks for Diverse Video Captioning

Tonmoay Deb , Akib Sadmanee , Kishor Kumar Bhaumik , Amin Ahsan Ali , M Ashraful Amin , A K M Mahbubur Rahman

Categories: Computer Vision | Natural Language Processing

2022-01-04

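When describing spatio-temporal events in natural language, video captioning models rely mostly on the encoder's latent visual representation. Recent encoder-decoder models attend to encoder features mainly through linear interaction with the decoder. However, the growing complexity of models for visual data encourages more explicit feature interaction to capture fine-grained information, which is currently absent in the video captioning domain. Moreover, feature aggregation methods have been used to reveal richer visual representations, either by concatenation or by using a linear layer. Although the feature sets for a video overlap semantically to some extent, these approaches lead to objective mismatch and feature redundancy. In addition, diversity in captions is a fundamental component of expressing one event from several meaningful perspectives, and it is currently missing in the temporal (i.e., video captioning) domain. To this end, we propose the Variational Stacked Local Attention Network (VSLAN), which exploits low-rank bilinear pooling for self-attentive feature interaction and stacks multiple video feature streams in a discount fashion. The learned attributes of each feature stack contribute to our proposed diversity encoding module, followed by a decoding query stage, to facilitate end-to-end diverse and natural captions without any explicit supervision on attributes. We evaluate VSLAN on the MSVD and MSR-VTT datasets in terms of syntax and diversity. The CIDEr score of VSLAN outperforms current off-the-shelf methods by 4.5% and 4.8% on MSVD and MSR-VTT, respectively. On the same datasets, VSLAN achieves competitive results in caption diversity metrics.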
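The low-rank bilinear pooling mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common Hadamard-product formulation z = P^T((U^T x) * (V^T y)); the abstract does not give VSLAN's exact formulation, so the function names, matrix shapes, and random weights below are hypothetical and are not the authors' implementation.

```python
import random


def matvec_t(matrix, vec):
    """Return matrix^T @ vec for a row-major list-of-lists matrix."""
    cols = len(matrix[0])
    return [sum(matrix[i][j] * vec[i] for i in range(len(vec)))
            for j in range(cols)]


def low_rank_bilinear_pool(x, y, U, V, P):
    """Compute z = P^T ((U^T x) * (V^T y)), where * is element-wise."""
    ux = matvec_t(U, x)                      # project x into the rank-d joint space
    vy = matvec_t(V, y)                      # project y into the same space
    joint = [a * b for a, b in zip(ux, vy)]  # Hadamard product of the projections
    return matvec_t(P, joint)                # project to the output dimension


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(6)]   # e.g. one feature vector
y = [rng.uniform(-1, 1) for _ in range(4)]   # e.g. a second feature vector
U = random_matrix(6, 3, rng)                 # rank d = 3
V = random_matrix(4, 3, rng)
P = random_matrix(3, 2, rng)                 # output dimension 2
z = low_rank_bilinear_pool(x, y, U, V, P)
print(len(z))
```

The point of the low-rank form is that the three small matrices replace a full bilinear tensor of size 6 x 4 x 2, while every output element still depends on products of entries from both inputs.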

translated by Google Translate

Rethinking Cooking State Recognition with Vision Transformers

Akib Mohammed Khan , Alif Ashrafee , Reeshoon Sayera , Shahriar Ivan , Sabbir Ahmed

Categories: Computer Vision

2022-12-16

To ensure proper knowledge representation of the kitchen environment, it is vital for kitchen robots to recognize the states of the food items that are being cooked. Although the domain of object detection and recognition has been extensively studied, the task of object state classification has remained relatively unexplored. The high intra-class similarity of ingredients during different states of cooking makes the task even more challenging. Researchers have recently proposed deep-learning-based strategies, but these have yet to achieve high performance. In this study, we utilized the self-attention mechanism of the Vision Transformer (ViT) architecture for the Cooking State Recognition task. The proposed approach encapsulates the globally salient features from images, while also exploiting the weights learned from a larger dataset. This global attention allows the model to withstand the similarities between samples of different cooking objects, while the use of transfer learning helps to overcome the lack of inductive bias by utilizing pretrained weights. To improve recognition accuracy, several augmentation techniques have been employed as well. Evaluated on the 'Cooking State Recognition Challenge Dataset', our proposed framework achieves an accuracy of 94.3%, which significantly outperforms the state of the art.

Discovering novel systemic biomarkers in photos of the external eye

Boris Babenko , Ilana Traynis , Christina Chen , Preeti Singh , Akib Uddin , Jorge Cuadros , Lauren P. Daskivich , April Y. Maa , Ramasamy Kim , Eugene Yu-Chuan Kang

Categories: Computer Vision | Machine Learning

2022-07-19

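External eye photos were recently shown to reveal signs of diabetic retinal disease and elevated HbA1c. In this paper, we evaluate whether external eye photos contain information about additional systemic medical conditions. We developed a deep learning system (DLS) that takes external eye photos as input and predicts multiple systemic parameters, such as those related to the liver (albumin, AST); kidney (eGFR estimated using the race-free 2021 CKD-EPI creatinine equation, urine ACR); bone and mineral (calcium); thyroid (TSH); and blood count (Hgb, WBC, platelets). Development leveraged 151,237 images from 49,015 patients with diabetes undergoing diabetic eye screening at 11 sites in Los Angeles County, California. Evaluation focused on 9 pre-specified systemic parameters and leveraged 3 validation sets (A, B, C) spanning 28,869 patients with and without diabetes undergoing eye screening at 3 independent sites in Los Angeles County, California, and the greater Atlanta area. We compared against baseline models incorporating available clinicodemographic variables (e.g., age, sex, race/ethnicity, years with diabetes). Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST > 36, calcium < 8.6, eGFR < 60, Hgb < 11, platelets < 150, ACR >= 300, and WBC < 4 on validation set A (a patient population similar to the development set), where the AUC of the DLS exceeded that of the baseline by 5.2-19.4%. On validation sets B and C, whose patient populations differ substantially from the development set, the DLS outperformed the baseline for ACR >= 300 and Hgb < 11 by 7.3-13.2%. Our findings provide further evidence that external eye photos contain biomarkers of systemic health spanning multiple organ systems. Further work is needed to investigate whether and how these biomarkers can be translated into clinical impact.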
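The comparison between the DLS and the baseline rests on the AUC. A minimal sketch of a rank-based AUC follows. The labels and scores are made-up toy values for illustration only, not data from the study, and the abstract does not say whether the reported gains are absolute or relative, so the difference computed here is simply one way to compare two models.

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a positive outscores a negative.

    Ties between a positive and a negative count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: 1 = lab value past the threshold, 0 = not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
baseline_scores = [0.6, 0.4, 0.3, 0.5, 0.35, 0.2, 0.1, 0.3]  # hypothetical baseline model
dls_scores = [0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1, 0.35]       # hypothetical DLS

baseline_auc = auc(labels, baseline_scores)
dls_auc = auc(labels, dls_scores)
print(f"baseline AUC = {baseline_auc:.3f}, DLS AUC = {dls_auc:.3f}, "
      f"difference = {dls_auc - baseline_auc:.3f}")
```

The quadratic pairwise loop is chosen for readability; a sort-based version gives the same value faster on large validation sets.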

Real-time Bangla License Plate Recognition System for Low Resource Video-based Applications

Alif Ashrafee , Akib Mohammed Khan , Mohammad Sabik Irbaz , MD Abdullah Al Nasim

Categories: Computer Vision | Artificial Intelligence

2021-08-18

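Automatic license plate recognition systems aim to provide a solution for detecting, localizing, and recognizing license plate characters from vehicles appearing in video frames. However, deploying such systems in the real world requires real-time performance in low-resource environments. In our paper, we propose a two-stage detection pipeline paired with a Vision API that provides real-time inference speed along with consistently accurate detection and recognition performance. We use a Haar-cascade classifier as a filter on top of our backbone MobileNet SSDv2 detection model. This reduces inference time by focusing only on high-confidence detections and using them for recognition. We also impose a temporal frame separation strategy to distinguish between multiple vehicle license plates in the same clip. Furthermore, because there are no publicly available Bangla license plate datasets, we created an image dataset and a video dataset containing license plates in the wild. We trained our models on the image dataset and achieved an AP(0.5) score of 86%. We then tested our pipeline on the video dataset and observed reasonable detection and recognition performance (82.7% detection rate, 60.8% OCR F1 score) at real-time processing speed (27.2 frames per second).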
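The two ideas in this abstract, keeping only high-confidence detections and separating vehicles by their position in time, can be sketched as follows. The abstract gives no thresholds or implementation details, so the `Detection` class, the 0.8 confidence threshold, the 15-frame gap, and the toy detections are all hypothetical; this is one plausible reading, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    frame: int         # index of the video frame
    confidence: float  # detector confidence in [0, 1]
    text: str          # placeholder for a recognized plate string


def keep_confident(detections, threshold=0.8):
    """Drop low-confidence detections before the costly recognition step."""
    return [d for d in detections if d.confidence >= threshold]


def split_by_frame_gap(detections, max_gap=15):
    """Group detections in time; a gap above max_gap frames starts a new vehicle."""
    groups = []
    for det in sorted(detections, key=lambda d: d.frame):
        if groups and det.frame - groups[-1][-1].frame <= max_gap:
            groups[-1].append(det)
        else:
            groups.append([det])
    return groups


def best_per_group(groups):
    """Send only the most confident detection of each vehicle to recognition."""
    return [max(group, key=lambda d: d.confidence) for group in groups]


# Made-up toy detections from a single clip.
detections = [
    Detection(1, 0.90, "PLATE-A"),
    Detection(2, 0.50, "PLATE-A"),
    Detection(3, 0.95, "PLATE-A"),
    Detection(100, 0.85, "PLATE-B"),
    Detection(101, 0.70, "PLATE-B"),
]
plates = best_per_group(split_by_frame_gap(keep_confident(detections)))
for plate in plates:
    print(plate.frame, plate.text, plate.confidence)
```

Filtering before grouping means the recognition step runs once per vehicle rather than once per frame, which is where the saving in inference time would come from under these assumptions.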

Knowledge, beliefs, attitudes and perceived risk about COVID-19 vaccine and determinants of COVID-19 vaccine acceptance in Bangladesh

Sultan Mahmud , Md. Mohsin , Ijaz Ahmed Khan , Ashraf Uddin Mian , Miah Akib Zaman

Categories: Artificial Intelligence

2021-03-28

A total of 605 eligible respondents took part in this survey (population size 1630046161 and required sample size 591) with an age range of 18 to 100. A large proportion of the respondents are aged less than 50 (82%) and male (62.15%). The majority of the respondents live in urban areas (60.83%). A total of 61.16% (370/605) of the respondents were willing to accept the COVID-19 vaccine. Among those willing to accept it, only 35.14% were willing to take the COVID-19 vaccine immediately, while 64.86% would delay vaccination until the vaccine's efficacy and safety are confirmed or COVID-19 becomes deadlier in Bangladesh. The regression results showed that age, gender, location (urban/rural), level of education, income, perceived risk of being infected with COVID-19 in the future, perceived severity of infection, previous vaccination experience after age 18, and higher knowledge about COVID-19 and vaccination were significantly associated with the acceptance of COVID-19 vaccines. The research reported a high prevalence of COVID-19 vaccine refusal and hesitancy in Bangladesh.
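The headline acceptance figure can be checked with simple arithmetic, and a rough uncertainty range can be attached to it. The sketch below uses the 370/605 count from the abstract; the normal-approximation (Wald) interval and the 95% level (z = 1.96) are assumptions added for illustration and are not reported in the abstract.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Return (p, lower, upper) using the normal-approximation (Wald) interval."""
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width


# 370 of 605 respondents were willing to accept the vaccine (from the abstract).
p, lo, hi = proportion_ci(370, 605)
print(f"acceptance: {p:.2%} (approx. 95% CI {lo:.2%} to {hi:.2%})")
```

The point estimate rounds to the 61.16% stated in the abstract; the interval only reflects sampling variability under simple random sampling, which the abstract does not confirm.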
